Getting started with R

From the very basics

Carina Nigg, Judith Bouman

Introduction

Disclaimer

This is an introductory course, and we expect no prior knowledge

Adjust your speed to your own level

Ask questions at any time

Tell us if you are bored or overwhelmed

Course content

Introduction of R and RStudio

  • Understanding R and RStudio
  • Basic functions
  • Write a first script
  • Understanding packages
  • Import data
  • Basic functions on imported data

Analyzing your own data

  • Organize your data
  • Load your data into R
  • Provide overview of your data
  • Inspect missing data
  • Check plausibility of your data

The program

Time Topic
27th 9.00 - 9.30 R and RStudio, why?

Final exercise

  • Choose your own dataset
  • Organize your data folder
  • Read the data into the R environment using an R script
  • Select 4 variables (at least one numerical and one categorical)
  • Find out what classes your variables have and correct them to the most sensible class if necessary
  • Check for missing values for all variables
  • Give summary statistics
      • For factor/categorical variables, give a table with counts and percentages
      • For numerical variables, give a table with mean/median, sd, min, max
  • Visualize
      • For numerical variables: histogram/boxplot
      • For categorical variables: bar plot
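
The non-plotting steps above could be sketched roughly as follows. This is only an illustration: it uses the built-in iris dataset as a stand-in for your own data, and the object names (dat, counts, percentages, num_summary) are made up for this example.

```r
# Built-in example data as a stand-in for your own dataset (assumption)
dat <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Species")]

# Class of every selected variable
sapply(dat, class)

# Number of missing values per variable
colSums(is.na(dat))

# Categorical variable: table with counts and percentages
counts <- table(dat$Species)
percentages <- round(100 * prop.table(counts), 1)
cbind(count = counts, percent = percentages)

# Numerical variable: mean, median, sd, min, max
num_summary <- c(
  mean   = mean(dat$Sepal.Length),
  median = median(dat$Sepal.Length),
  sd     = sd(dat$Sepal.Length),
  min    = min(dat$Sepal.Length),
  max    = max(dat$Sepal.Length)
)
num_summary
```

For the plots, hist(), boxplot() and barplot() are the base R functions to look up.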

Grading: Pass or fail (0.5 ECTS)

Why learn R?

  • Reproducibility of your results
  • Free software (unlike Stata or SPSS)
  • A lot of resources available
  • Potentially useful in your further career

Installing R and RStudio

Did you all manage to install R and RStudio?

Difference between R and RStudio

Opening RStudio

Basic concepts in R

Simple calculator

1 + 1 
[1] 2

Using a script

  • Track code
  • Adjust code
  • Repeat code

Now we can download, save, and open “follow_along_script_monday_morning.R”

Saving “objects”

a = 1 
b = 2
c = a + b

# can use = or "<-"

c <- a + b 

c
[1] 3

Vector

c(1,2,3)
[1] 1 2 3
c(a, b)
[1] 1 2
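
Besides c(), vectors can also be built from sequences, and arithmetic is applied to every element at once. A small sketch (the object names are made up for this example):

```r
# Whole numbers from 1 to 5
one_to_five <- 1:5

# From 0 to 1 in steps of 0.25
steps <- seq(from = 0, to = 1, by = 0.25)

# Arithmetic works element by element
doubled <- one_to_five * 2
doubled
```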

Vector

How to access an element in the vector

a_vector = c(1.23, 2.34, 6.21, 3.11, 3.412, 4.32, 5.922, 5.65)

a_vector[4]
[1] 3.11
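
Square brackets also accept several positions, negative positions, or a condition. A short sketch using the same vector:

```r
a_vector <- c(1.23, 2.34, 6.21, 3.11, 3.412, 4.32, 5.922, 5.65)

a_vector[c(1, 4)]        # first and fourth element
a_vector[2:4]            # second to fourth element
a_vector[-1]             # everything except the first element
a_vector[a_vector > 5]   # only the elements larger than 5
```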

Array/matrix

matrix(data = c(1,2,3,4,5,6,7,8), nrow = 4 )
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8

Array/matrix

How to access an element from a matrix

a_matrix = matrix(data = c(1,2,3,4,5,6,7,8), nrow = 4 )

a_matrix[3,2] # first row, then column
[1] 7
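
Leaving the row or the column position empty returns a whole column or row. A short sketch with the same matrix:

```r
a_matrix <- matrix(data = c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4)

a_matrix[3, ]   # whole third row
a_matrix[, 2]   # whole second column
dim(a_matrix)   # number of rows, then number of columns
```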

Using “functions”

sum(1, 2)
[1] 3
sum(a , b)
[1] 3
?sum

Getting help with functions

?sum
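
The help page lists the arguments a function accepts. For sum, one of them is na.rm, which matters as soon as data contain missing values. A small sketch (the object names are made up for this example):

```r
help("sum")   # same as ?sum
args(sum)     # show the arguments of a function

# With a missing value the result is NA unless na.rm = TRUE is set
with_missing <- c(1, NA, 3)
sum(with_missing)
total <- sum(with_missing, na.rm = TRUE)
total
```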

Let’s try!

Can you calculate the following for “a_vector”?

  • Mean
  • Standard deviation
  • Maximum value
  • Minimum value
  • Length of the vector

a_vector = c(1.23, 2.34, 6.21, 3.11, 3.412, 4.32, 5.922, 5.65)

Let’s try! - Solution

a_vector = c(1.23, 2.34, 6.21, 3.11, 3.412, 4.32, 5.922, 5.65)

mean(a_vector)
[1] 4.02425
sd(a_vector)
[1] 1.811263
max(a_vector)
[1] 6.21
min(a_vector)
[1] 1.23
length(a_vector)
[1] 8

Take 15 minutes to try

Classes and types

Numeric and character

a_vector = c(1.23, 2.34, 6.21, 3.11, 3.412, 4.32, 5.922, 5.65)

class(a_vector)
[1] "numeric"
b_vector = c("something", "something else", "another thing", "completely different")

class(b_vector)
[1] "character"
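
When a variable has the wrong class, it can often be converted with the as.*() functions or with factor(). A minimal sketch with made-up values:

```r
# Numbers that were stored as text
numbers_as_text <- c("1.5", "2", "3.25")
class(numbers_as_text)

# Convert to numeric so that calculations work
numbers <- as.numeric(numbers_as_text)
class(numbers)
mean(numbers)

# Text that represents groups can be stored as a factor
groups <- factor(c("control", "treated", "control"))
class(groups)
levels(groups)
```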

Classes and types

Logical

# TRUE and FALSE are safer than T and F, which can be overwritten
c_vector = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

class(c_vector)
[1] "logical"
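
Logical vectors usually come from comparisons, and they can be counted or used to select elements. A short sketch, reusing the vectors from the slides:

```r
a_vector <- c(1.23, 2.34, 6.21, 3.11, 3.412, 4.32, 5.922, 5.65)
c_vector <- c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

a_vector > 4         # a comparison returns a logical vector
sum(c_vector)        # number of TRUE values
mean(c_vector)       # proportion of TRUE values
a_vector[c_vector]   # keep only the elements where c_vector is TRUE
```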

Dataframe

# Create a data frame
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Printing
print(df)
  x y
1 1 a
2 2 b
3 3 c
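
Columns and rows of a data frame can be accessed in a similar way to a matrix, and columns additionally by name. A short sketch with the same data frame:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

df$x                 # one column as a vector
df[2, ]              # second row
df[df$x > 1, "y"]    # values of y in the rows where x is larger than 1
str(df)              # compact overview: dimensions, column names and classes
nrow(df)             # number of rows
ncol(df)             # number of columns
```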

Tibble

# Create a tibble
library(tibble)
tb <- tibble(x = 1:3, y = c("a", "b", "c"))

print(tb)
# A tibble: 3 × 2
      x y    
  <int> <chr>
1     1 a    
2     2 b    
3     3 c    

Packages

Get access to a specific set of functions

#install.packages("tidyverse")
library(tidyverse)

Run “library()” once in every new R session before you use functions from this package
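
Installing is needed only once per computer, loading once per session. One possible way to check whether a package is installed, and to call a single function without loading the whole package, is sketched below (the message text is made up for this example):

```r
# Check whether the package is installed without loading it
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  message("tidyverse is not installed; run install.packages(\"tidyverse\") once")
}

# package::function calls one function without library()
# (stats ships with R, so this works on any installation)
stats::sd(c(1, 2, 3))
```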

Let’s try!

Get your data in R

Organize your data

project_dir

Organize your data

Choose your working directory

getwd() 
[1] "/Users/jb22m516/Documents/GitHub/getting_started_with_R/slides"
setwd("/Users/jb22m516/Documents/GitHub/getting_started_with_R/")

“Path” to file in R

A “path” tells R where it can find your data
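
A path can be built from its parts with file.path(), relative to the working directory. In this sketch the folder name "data" and the file name "my_data.csv" are hypothetical placeholders:

```r
# Build a path relative to the working directory
# ("data" and "my_data.csv" are made-up names)
data_path <- file.path("data", "my_data.csv")
data_path

# Check whether R can find a file at this path
file.exists(data_path)

# File name without the folders
basename(data_path)
```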

Functions for loading data

# utils is part of base R and is loaded automatically, so it does not need to be installed
library(utils)

#data = read.csv2()
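
To see what read.csv2() does without needing a file of your own, the sketch below first writes a small made-up table to a temporary file and then reads it back. read.csv2() expects semicolons between columns and a decimal comma; read.csv() expects commas and a decimal point.

```r
# Made-up example data, written to a temporary file
tmp_file <- tempfile(fileext = ".csv")
example_data <- data.frame(id = 1:3, weight = c(60.5, 72.1, 58.3))
write.csv2(example_data, tmp_file, row.names = FALSE)

# Read the file back into R
loaded <- read.csv2(tmp_file)
str(loaded)
```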

Types of data in R

tibble / dataframe / matrix / array
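
These types can be converted into each other. A small base R sketch with made-up objects (tibbles additionally need the tibble package and are left out here):

```r
# A matrix: two dimensions, all values of the same type
m <- matrix(1:6, nrow = 3)
is.matrix(m)

# Convert the matrix to a data frame; columns get default names
d <- as.data.frame(m)
class(d)
names(d)

# An array can have more than two dimensions
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)
```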